LOT: A Story-Centric Benchmark for Evaluating Chinese Long Text Understanding and Generation

Abstract

Standard multi-task benchmarks are essential for developing pretraining models that can generalize to various downstream tasks. Existing benchmarks for natural language processing (NLP) usually focus only on understanding or generating short texts. However, long text modeling requires many distinct abilities in contrast to short texts, such as the modeling of long-range discourse and commonsense relations, and the coherence and controllability of generation. The lack of standardized benchmarks makes it difficult to assess these abilities for a model and fairly compare different models, especially Chinese models. Therefore, we propose a story-centric benchmark named LOT for evaluating Chinese long text modeling, which aggregates two understanding tasks and two generation tasks. We construct new datasets for these tasks based on human-written Chinese stories with hundreds of words. Furthermore, we release an encoder-decoder-based Chinese long text pretraining model named LongLM with up to 1 billion parameters. We pretrain LongLM on 120G Chinese novels with two generative tasks including text infilling and conditional continuation. Extensive experiments show that LongLM outperforms similar-sized pretraining models substantially on both the understanding and generation tasks in LOT.

Similar resources

Extending and Evaluating a Platform for Story Understanding

We summarize recent developments in our platform for symbolically representing and reasoning over human narratives. The expressive range of the system is bolstered by the infusion of a large library of knowledge frames, including verbs, adjectives, nouns and adverbs, from external linguistic resources. Extensions to the model itself include alternate timelines (imagined states for goals, plans,...

Evaluating Text Understanding Systems

The Naval Ocean Systems Center is extending the scope of previous efforts in the area of evaluating English text analysis systems and is seeking to refine the methodology in order to obtain performance benchmarks on an information extraction task for recall, precision, overgeneration, and fallout for a variety of systems. The methodology is also intended to enable the collection of qualita...

Evaluating Generative Models for Text Generation

Generating human quality text is a challenging problem because of ambiguity of meaning and difficulty in modeling long term semantic connections. Recurrent Neural Networks (RNNs) have shown promising results in this problem domain, with the most common approach to its training being to maximize the log predictive likelihood of each true token in the training sequence given the previously observ...

Meaning-Centric Framework for Natural Text/Scene Understanding by Robots

In the past fifty years, efforts in classical AI have focussed on computerizing human intelligence. Naturally, computerized human intelligence is not a proof of machine or robot intelligence because the programs underlying computerized human intelligence are still made by humans. So far, there is no computer nor robot which is creative enough to master its own language and to compose a text exp...

Journal

Journal title: Transactions of the Association for Computational Linguistics

Year: 2022

ISSN: 2307-387X

DOI: https://doi.org/10.1162/tacl_a_00469